18 research outputs found
Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data
B cells develop high affinity receptors during the course of affinity
maturation, a cyclic process of mutation and selection. At the end of affinity
maturation, a number of cells sharing the same ancestor (i.e. in the same
"clonal family") are released from the germinal center; their amino acid
frequency profile reflects the allowed and disallowed substitutions at each
position. These clonal-family-specific frequency profiles, called "substitution
profiles", are useful for studying the course of affinity maturation as well as
for antibody engineering purposes. However, most often only a single sequence
is recovered from each clonal family in a sequencing experiment, making it
impossible to construct a clonal-family-specific substitution profile. Given
the public release of many high-quality large B cell receptor datasets, one may
ask whether it is possible to use such data in a prediction model for
clonal-family-specific substitution profiles. In this paper, we present the
method "Substitution Profiles Using Related Families" (SPURF), a penalized
tensor regression framework that integrates information from a rich assemblage
of datasets to predict the clonal-family-specific substitution profile for any
single input sequence. Using this framework, we show that substitution profiles
from similar clonal families can be leveraged together with simulated
substitution profiles and germline gene sequence information to improve
prediction. We fit this model on a large public dataset and validate the
robustness of our approach on an external dataset. Furthermore, we provide a
command-line tool in an open-source software package
(https://github.com/krdav/SPURF) implementing these ideas and providing easy
prediction using our pre-fit models.
Comment: 23 pages
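As a rough illustration of what a "substitution profile" is, the sketch below computes per-position amino acid frequencies from a set of aligned sequences in one clonal family. It is not SPURF and does not use its API; the function name `substitution_profile` and the toy sequences are assumptions made up for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def substitution_profile(aligned_seqs):
    """Return per-position amino acid frequencies for equal-length sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos in range(length):
        # Count the residues observed at this alignment column.
        counts = Counter(s[pos] for s in aligned_seqs)
        total = sum(counts[a] for a in AMINO_ACIDS)
        profile.append(
            {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}
        )
    return profile

# Toy, made-up clonal family (hypothetical data, for illustration only).
family = ["CARDY", "CARDY", "CAKDY", "CTRDY"]
profile = substitution_profile(family)
```

With a single recovered sequence, every column of such a profile collapses to one residue with frequency 1, which is the situation the abstract describes as motivating a prediction model.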
A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis.
The human body generates a diverse set of high affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling
Effect of hot and cooled carbohydrate diet on glycemic response in healthy individuals: a cross over study
Background: Cooling of starch after cooking is known to cause starch retrogradation, which increases resistant starch content. Resistant starch cannot be digested in the gut and acts as dietary fiber. The study aimed to determine the effect of cooling a carbohydrate-rich diet on glycemic response in healthy adults. Methods: The present study was a randomized, single-blind, crossover study in which 20 healthy subjects were selected. Two rice preparations were used: one freshly prepared and hot, the other cooked and cooled at 4°C for 12 hours. All subjects were evaluated after receiving each rice preparation separately, with a crossover period of 7 days. Glycemic response was checked over a period of 2 hours at various time intervals using an ACCU-CHEK® Active glucometer. Results: Glycemic response with cooled white rice was better than with freshly prepared hot white rice at all time points (mean±SD, 121.9±17.4 vs 128.0±22.1 mg/dl). However, the difference in means was greatest at 30 mins and was statistically significant (p<0.001). Conclusions: Cooled white rice yields a better glycemic response when consumed by healthy individuals, possibly due to formation of resistant starch.
Prevalence of hepatitis C in patients with chronic kidney disease at a tertiary care hospital in north India: a retrospective analysis
Background: Hepatitis C and chronic kidney disease (CKD) both present an unsolved public health problem. Hepatitis C virus (HCV) is easily transmitted in haemodialysis units and by kidney transplantation. HCV leads to increased mortality and morbidity due to cirrhosis and hepatocellular carcinoma, while accelerating the progression of CKD. The aim of the study was to describe the demographic and clinical/biochemical profile, and the prevalence of HCV infection, in patients with CKD. Methods: This was a retrospective analysis of patients with CKD who presented to the out/in-patient department of medicine in a tertiary care center in Jammu from Feb 2016 to Nov 2018. A detailed clinical history along with previous lab reports was noted, and tests for HCV infection were conducted in all patients. Diagnosis of HCV was made via HCV RNA (RT-PCR) and positive anti-HCV IgG serology. Results: A total of 67 patients were included, with a median age of 54 years (range 43-72 years); the majority (76.1%) were males, and 71.6% were in the 41-60 years age group. Of these, 31.4% were HCV positive, of whom 81% were males. Seven patients were found to have co-infection with HIV and HBsAg. Genotype 1 (72%) was found to be more common than Genotype 3. Ultrasonography and upper GI endoscopy showed a dilated splenoportal axis and oesophageal varices, respectively, in 57%. Conclusions: Prevalence of HCV infection in CKD patients is high, with genotype 1 being the commonest. False-negative anti-HCV antibody results are common; hence screening with HCV RNA is recommended. Strict universal precautions should be employed in hospitals and dialysis units to prevent transmission.
A Bayesian Phylogenetic Hidden Markov Model for B Cell Receptor Sequence Analysis
The human body is able to generate a diverse set of high affinity antibodies,
the soluble form of B cell receptors (BCRs), that bind to and neutralize
invading pathogens. The natural development of BCRs must be understood in order
to design vaccines for highly mutable pathogens such as influenza and HIV. BCR
diversity is induced by naturally occurring combinatorial "V(D)J"
rearrangement, mutation, and selection processes. Most current methods for BCR
sequence analysis focus on separately modeling the above processes. Statistical
phylogenetic methods are often used to model the mutational dynamics of BCR
sequence data, but these techniques do not consider all the complexities
associated with B cell diversification such as the V(D)J rearrangement process.
In particular, standard phylogenetic approaches assume the DNA bases of the
progenitor (or "naive") sequence arise independently and according to the same
distribution, ignoring the complexities of V(D)J rearrangement. In this paper,
we introduce a novel approach to Bayesian phylogenetic inference for BCR
sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This
technique not only integrates a naive rearrangement model with a phylogenetic
model for BCR sequence evolution but also naturally accounts for uncertainty in
all unobserved variables, including the phylogenetic tree, via posterior
distribution sampling.
Comment: 26 pages
Large-Scale B Cell Receptor Sequence Analysis Using Phylogenetics and Machine Learning
Thesis (Ph.D.)--University of Washington, 2019
The adaptive immune system synthesizes antibodies, the soluble form of B cell receptors (BCRs), to bind to and neutralize pathogens that enter our body. B cells are able to generate a diverse set of high affinity antibodies through the affinity maturation process. During maturation, "naive" BCR sequences first accumulate mutations according to a neutral evolutionary process called somatic hypermutation (SHM), which may modify the associated binding affinities, and then are subject to natural selection by clonal expansion, which promotes the higher affinity antibodies. The set of mutated BCRs that result from a single naive BCR undergoing SHM can be referred to as a "clonal family". In my thesis, I study the mechanisms that govern the aforementioned evolutionary and selective processes of BCR sequences with the goal of better understanding how naive B cells diversify into mature B cells with high binding affinities. It is frequently important to infer the full evolutionary paths from a given naive BCR sequence to the corresponding mature BCR sequences in the clonal family. Stochastic mapping, a missing data imputation technique, can be used to estimate the mutational trajectories mentioned above; it is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. Current simulation-free algorithms can compute the mean but not any higher-order moments of the number of substitutions or of other stochastic mapping summaries; these algorithms scale linearly in the number of tips of the phylogenetic tree. I present the first simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. This procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations.
Before one can perform clonal lineage or ancestral sequence inference in a clonal family, one must first obtain an estimate of the clonal phylogenetic tree. Currently, standard phylogenetic inference techniques are used to model the SHM process; however, these methods do not account for all the complexities associated with this mutation process. I introduce a novel approach to inference that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique is not only based on a more biologically realistic model of evolution but also designed to scale to the large datasets that result from high-throughput sequencing. In the antibody engineering field, researchers would like to infer the most likely per-site substitutions that are allowed in a clonal family. Unfortunately, many clonal families are small in size and do not have enough observed sequence information to accurately answer the preceding question. Despite this, there are structural properties associated with BCR sequences that are common across clonal families. I propose a penalized regression model that leverages aggregated amino acid count data (also known as "substitution profiles") in large clonal families to predict the substitution profiles in smaller clonal families. I show that there is information, possibly embedded through structural and functional constraints, contained within these large clonal families that can be shared with the smaller ones to enhance their substitution profile predictions. It is important to note that this regularized model assumes independence across sites, which is not a realistic assumption, so I consider extensions to models that account for coevolving sites.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences
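To make the simulation-based baseline concrete, here is a minimal sketch of stochastic mapping on a single branch: substitution histories are simulated under a continuous-time Markov model, kept only when they end in the observed state, and then summarized by the mean and variance of the substitution count. This is the simulation approach the abstract contrasts itself with, not the authors' simulation-free algorithm. The Jukes-Cantor-style model with equal rates, the rejection sampler, and all function names and parameter values are assumptions chosen for illustration.

```python
import random

STATES = "ACGT"

def simulate_path(start, rate, t, rng):
    """Simulate an equal-rates CTMC for time t; return (end_state, n_substitutions)."""
    state, clock, n = start, 0.0, 0
    while True:
        clock += rng.expovariate(rate)  # waiting time to the next substitution
        if clock > t:
            return state, n
        # Jump to one of the three other states with equal probability.
        state = rng.choice([s for s in STATES if s != state])
        n += 1

def sample_counts(start, end, rate, t, n_samples, seed=0):
    """Rejection-sample substitution counts conditioned on both branch endpoints."""
    rng = random.Random(seed)
    counts = []
    while len(counts) < n_samples:
        final, n = simulate_path(start, rate, t, rng)
        if final == end:  # keep only histories consistent with the observed end state
            counts.append(n)
    return counts

def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

# Hypothetical branch: starts in A, ends in G, total substitution rate 1.0, length 0.5.
counts = sample_counts("A", "G", rate=1.0, t=0.5, n_samples=2000)
mean, var = mean_and_variance(counts)
```

Because the estimates come from random draws, their accuracy depends on the number of accepted samples, which is part of the cost that simulation-free algorithms avoid.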